This script uses the output of “r_l_preparation.Rmd” and reproduces all values reported in the paper.
Load packages:
library(tidyverse) # data processing
library(brms) # bayesian models
#library(cmdstanr) # install it with: install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos"))) followed by install_cmdstan()
library(ggdist) # for plotting
# option for Bayesian regression models:
# use all available cores for parallel computing
options(mc.cores = parallel::detectCores())
## Set the script's path as working directory
#parentfolder = rstudioapi::getActiveDocumentContext()$path
#setwd(dirname(parentfolder))
#parentfolder <- getwd()
parentfolder <- "."; # assume current folder is the document folder
models <- paste0(parentfolder, '/models/')
plots <- paste0(parentfolder, '/plots/')
data <- paste0(parentfolder, '/data/')
For reproducible reporting:
packageVersion('tidyverse')
## [1] '2.0.0'
packageVersion('ggplot2')
## [1] '3.4.4'
packageVersion('brms')
## [1] '2.20.4'
#packageVersion('cmdstanr')
packageVersion('ggdist')
## [1] '3.3.1'
Load ggplot2 theme and colors:
source('theme_timo.R')
colorBlindBlack8 <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")
Load data:
web <- read_csv(paste0(data, 'web_experiment_cleaned.csv'))
web_raw <- read_csv(paste0(data, 'web_raw_trials.csv'))
field <- read_csv(paste0(data, 'field_experiment_cleaned.csv'))
field_raw <- read_csv(paste0(data, 'field_raw_trials.csv'))
First, how many participants?
nrow(web)
## [1] 903
Sex distribution
table(web$Sex)
##
## F M
## 681 222
Ages
summary(web$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 23.00 29.00 32.92 40.00 84.00
Order distribution
# Counts
table(web$Order)
##
## l_first r_first
## 436 467
# Percentage
prop.table(table(web$Order)) * 100
##
## l_first r_first
## 48.2835 51.7165
Next, how many languages?
web %>% count(Name) %>% nrow()
## [1] 25
Does this number correspond with the L1s?
web %>% count(Language) %>% nrow()
## [1] 25
How many families?
web %>% count(Family) %>% nrow()
## [1] 9
How many of the languages have the R/L distinction in the L1?
web %>% count(Language, r_l_distinction_L1) %>% count(r_l_distinction_L1)
How many of the languages actually use the alveolar trill in the L1?
web %>% count(Language, trill_real_L1) %>% count(trill_real_L1)
How many of the languages have the alveolar trill as an allophone in the L1?
web %>% count(Language, trill_occ_L1) %>% count(trill_occ_L1)
Now the same questions for the L2. Note that these counts will not neatly sum to 25, since various L2 scenarios are possible within a given L1.
How many of the languages have the R/L distinction in an L2?
web %>% count(Language, r_l_distinction_L2) %>% count(r_l_distinction_L2)
How many of the languages actually use the alveolar trill in an L2?
web %>% count(Language, trill_real_L2) %>% count(trill_real_L2)
How many of the languages have the alveolar trill as an allophone in an L2?
web %>% count(Language, trill_occ_L2) %>% count(trill_occ_L2)
What is the grand average congruent behavior?
mean(web$Match)
## [1] 0.8726467
87.3%
What about only those whose L1 lacks the distinction?
web %>%
filter(r_l_distinction_L1 == "0") %>%
summarize(mean_match = mean(Match, na.rm = TRUE))
83.9%
What about only those whose L1 lacks the distinction and who do not know English as an L2?
web %>%
filter(r_l_distinction_L1 == "0") %>%
filter(!EnglishL2YesNo) %>%
summarize(mean_match = mean(Match, na.rm = TRUE))
85.1%
Compute average matching behavior per language and sort:
web_avg <- web %>%
group_by(Language) %>%
summarize(M = mean(Match)) %>%
arrange(desc(M)) %>%
mutate(percent = round(M, 2) * 100,
percent = str_c(percent, '%'))
# Show:
web_avg %>% print(n = Inf)
## # A tibble: 25 × 3
## Language M percent
## <chr> <dbl> <chr>
## 1 FI 1 100%
## 2 FR 0.982 98%
## 3 EN 0.974 97%
## 4 DE 0.953 95%
## 5 SE 0.952 95%
## 6 DK 0.944 94%
## 7 HU 0.941 94%
## 8 JP 0.927 93%
## 9 KR 0.909 91%
## 10 GR 0.905 90%
## 11 PL 0.887 89%
## 12 RU 0.872 87%
## 13 EE 0.860 86%
## 14 FA 0.857 86%
## 15 AM 0.85 85%
## 16 ZU 0.85 85%
## 17 IT 0.846 85%
## 18 TR 0.811 81%
## 19 ES 0.806 81%
## 20 GE 0.8 80%
## 21 TH 0.8 80%
## 22 PT 0.770 77%
## 23 RO 0.742 74%
## 24 AL 0.7 70%
## 25 CN 0.696 70%
Check some demographics, also to report in the paper. First, the number of participants per language:
web %>%
count(Name, sort = TRUE) %>% print(n = Inf)
## # A tibble: 25 × 2
## Name n
## <chr> <int>
## 1 German 85
## 2 Portuguese 61
## 3 French 57
## 4 Japanese 55
## 5 Polish 53
## 6 Italian 52
## 7 Russian 47
## 8 Chinese 46
## 9 Estonian 43
## 10 Greek 42
## 11 English 39
## 12 Turkish 37
## 13 Spanish 36
## 14 Hungarian 34
## 15 Romanian 31
## 16 Korean 22
## 17 Farsi 21
## 18 Swedish 21
## 19 Armenian 20
## 20 Thai 20
## 21 Zulu 20
## 22 Danish 18
## 23 Finnish 18
## 24 Georgian 15
## 25 Albanian 10
Then, the number of L1 speakers who have the R/L distinction vs. those who don’t:
web %>% count(r_l_distinction_L1) %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
How many people do not have any L2?
sum(is.na(web$L2)) / nrow(web)
## [1] 0.1351052
# bilinguals:
1 - sum(is.na(web$L2)) / nrow(web)
## [1] 0.8648948
Check how many people knew English as their L2:
web %>% count(EnglishL2YesNo) %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
Of those that don’t have an R/L distinction in their L1, how many have the R/L distinction in their L2? (Double-check that this logic is right!)
web %>%
filter(r_l_distinction_L1 == 0) %>%
count(r_l_distinction_L2 == 1) %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
How many “pure” speakers were there? That is, people who 1) don’t know English, 2) whose L1 has no R/L distinction, and 3) who don’t know an L2 that distinguishes R/L.
web %>%
filter(r_l_distinction_L1 == 0) %>% # excludes English as well
filter(!EnglishL2YesNo) %>%
filter(r_l_distinction_L2 == 0) %>%
nrow()
## [1] 1
1 person!
Let’s check if this is correct. This gives the list of all participants for whom this applies.
web %>%
filter(r_l_distinction_L1 == 0 & !EnglishL2YesNo & r_l_distinction_L2 == 0) %>%
print(n = 50)
## # A tibble: 1 × 18
## ID Match Language Sex Age Name Script Family Autotyp_Area L2
## <chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 JP_2040343 1 JP F 48 Japane… diffe… Japan… N Coast Asia chin…
## # ℹ 8 more variables: EnglishL2YesNo <lgl>, Order <chr>,
## # r_l_distinction_L1 <dbl>, trill_real_L1 <dbl>, trill_occ_L1 <dbl>,
## # r_l_distinction_L2 <dbl>, trill_real_L2 <dbl>, trill_occ_L2 <dbl>
Are these really “pure”? What languages do they speak?
web %>%
filter(r_l_distinction_L1 == 0) %>% # excludes English as well
filter(!EnglishL2YesNo) %>%
filter(r_l_distinction_L2 == 0) %>%
count(Language)
One Japanese speaker.
Nevertheless, let’s explore whether these also show matching behavior:
web %>%
filter(r_l_distinction_L1 == 0) %>% # excludes English as well
filter(!EnglishL2YesNo) %>%
filter(r_l_distinction_L2 == 0) %>%
count(Match) %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
Yes, similar to above 85%.
Set default parameters for MCMC sampling:
mywarmup <- 4000
myiter <- 8000
Check the distribution of the R/L distinction across families, to make decisions about the random-effects structure:
table(web$Family, web$r_l_distinction_L1)
##
## 0 1
## Benue-Congo 20 0
## Finno-Ugric 0 95
## IE 0 593
## Japanese 55 0
## Kartvelian 0 15
## Korean 22 0
## Sino-Tibetan 46 0
## Tai-Kadai 0 20
## Turkic 0 37
Let’s make Order a factor with ‘r_first’ as the baseline:
web <- mutate(web,
Order = factor(Order, levels = c('r_first', 'l_first')))
Check how balanced the order manipulation and the linguistic predictors are:
web %>% count(Order) %>%
mutate(prop = n / sum(n))
web %>% count(r_l_distinction_L1) %>%
mutate(prop = n / sum(n))
# highly imbalanced: 15.8% vs 84.2%
web %>% count(trill_real_L1) %>%
mutate(prop = n / sum(n))
# ok: 58.8% vs 41.2%
web %>% count(trill_occ_L1) %>%
mutate(prop = n / sum(n))
# 100% are 1 --> excluded from the model
## And for L2, just in case
web %>% count(r_l_distinction_L2) %>%
mutate(prop = n / sum(n))
# 13.5% missing, the rest almost all (86.3%) are 1 and 0.2% are 0 ---> exclude as well
web %>% count(trill_real_L2) %>%
mutate(prop = n / sum(n))
# 13.5% missing, the rest is balanced: 53.8% vs 32.7%
web %>% count(trill_occ_L2) %>%
mutate(prop = n / sum(n))
# 13.5% missing, the rest are all (86.5%) are 1 ---> exclude as well
For now, I will just use them as they are coded.
web <- mutate(web,
order_num = ifelse(Order == 'r_first', -0.5, +0.5))
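A quick self-contained illustration of why ±0.5 deviation coding is convenient here (the counts mirror the order split reported above):

```r
# Deviation-code a two-level factor as -0.5 / +0.5: with a (near-)balanced
# design the coded predictor is centered close to 0, so the model intercept
# estimates the grand mean rather than the mean of the reference level.
Order <- factor(rep(c("r_first", "l_first"), times = c(467, 436)))
order_num <- ifelse(Order == "r_first", -0.5, 0.5)
round(mean(order_num), 3)  # -0.017, i.e. almost perfectly centered
```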
# Code them as factors:
web$r_l_distinction_L1_f <- factor(c("same", "distinct")[web$r_l_distinction_L1 + 1], levels=c("same", "distinct"));
web$trill_real_L1_f <- factor(c("no", "yes")[web$trill_real_L1 + 1], levels=c("no", "yes"));
web$trill_occ_L1_f <- factor(c("no", "yes")[web$trill_occ_L1 + 1], levels=c("no", "yes"));
web$r_l_distinction_L2_f <- factor(c("same", "distinct")[web$r_l_distinction_L2 + 1], levels=c("same", "distinct"));
web$trill_real_L2_f <- factor(c("no", "yes")[web$trill_real_L2 + 1], levels=c("no", "yes"));
web$trill_occ_L2_f <- factor(c("no", "yes")[web$trill_occ_L2 + 1], levels=c("no", "yes"));
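The `c(labels)[x + 1]` idiom above maps a 0/1 numeric vector to labels by indexing: 0 picks position 1, 1 picks position 2, and NAs propagate, which is the desired behavior for the L2 variables. A minimal illustration:

```r
# Index a label vector by the 0/1 codes shifted to 1/2; NA indices stay NA.
x <- c(0, 1, 1, 0, NA)
c("no", "yes")[x + 1]
# "no" "yes" "yes" "no" NA
```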
# Various auxiliary functions:
library(parallel);
library(lme4);
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'lme4'
## The following object is masked from 'package:brms':
##
## ngrps
library(performance);
library(brms);
library(bayestestR);
##
## Attaching package: 'bayestestR'
## The following object is masked from 'package:ggdist':
##
## hdi
library(ggplot2);
library(gridExtra);
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(ggpubr);
library(sjPlot);
## #refugeeswelcome
brms_ncores <- max(detectCores(all.tests=TRUE, logical=FALSE), 4, na.rm=TRUE); # try to use multiple cores, if present
# Verbal interpretation of Bayes factors (BF):
BF_interpretation <- function(BF, model1_name="m1", model2_name="m2")
{
if( BF > 100 ) return (paste0("extreme evidence for ",model1_name," against ",model2_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 30 ) return (paste0("very strong evidence for ",model1_name," against ",model2_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 10 ) return (paste0("strong evidence for ",model1_name," against ",model2_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 3 ) return (paste0("moderate evidence for ",model1_name," against ",model2_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 1 ) return (paste0("anecdotal evidence for ",model1_name," against ",model2_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF == 1 ) return (paste0("no evidence for either ",model1_name," or ",model2_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 0.33 ) return (paste0("anecdotal evidence for ",model2_name," against ",model1_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 0.10 ) return (paste0("moderate evidence for ",model2_name," against ",model1_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 0.033 ) return (paste0("strong evidence for ",model2_name," against ",model1_name, " (BF=",sprintf("%.3g",BF),")"));
if( BF > 0.010 ) return (paste0("very strong evidence for ",model2_name," against ",model1_name, " (BF=",sprintf("%.3g",BF),")"));
return (paste0("extreme evidence for ",model2_name," against ",model1_name, " (BF=",sprintf("%.3g",BF),")"));
}
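The thresholds above follow the usual Jeffreys-style scale. The same lookup can be condensed with `findInterval`; here is a self-contained sketch (it ignores the exact BF == 1 edge case that the full function above handles separately):

```r
# Condensed Jeffreys-style Bayes factor labelling (illustrative sketch only):
bf_label <- function(BF) {
  if (BF < 1) return(paste(bf_label(1 / BF), "(for the second model)"))
  labels <- c("anecdotal", "moderate", "strong", "very strong", "extreme")
  labels[findInterval(BF, c(1, 3, 10, 30, 100))]
}
bf_label(15)   # "strong"
bf_label(0.2)  # "moderate (for the second model)"
```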
# Here I hack brms' kfold code to make it run in parallel using good old mclapply instead of futures;
# this avoids random crashes which seem to be due to future, but works only on *NIX (which, for me here, is not an issue).
# Adapted from https://github.com/paul-buerkner/brms/blob/master/R/loo.R and https://github.com/paul-buerkner/brms/blob/master/R/kfold.R
if( Sys.info()['sysname'] == "Windows" )
{
# On Windows, fall back to the standard implementation in brms:
add_criterion_kfold_parallel <- function(model, K=10, chains=1)
{
return (add_criterion(model, criterion="kfold", K=K, chains=chains));
}
} else
{
# On anything else, try to use mclapply:
add_criterion_kfold_parallel <- function(model, K=10, chains=1)
{
model <- restructure(model);
mf <- model.frame(model);
attributes(mf)[c("terms", "brmsframe")] <- NULL;
N <- nrow(mf);
if( K > N ) return (model); # does not work in this case...
fold_type <- "random"; folds <- loo::kfold_split_random(K, N);
Ksub <- seq_len(K);
kfold_results <- mclapply(Ksub, function(k)
{
omitted <- predicted <- which(folds == k);
mf_omitted <- mf[-omitted, , drop=FALSE];
if( exists("subset_data2", envir=asNamespace("brms")) )
{
# Newer versions of brms:
model_updated <- base::suppressWarnings(update(model, newdata=mf_omitted, data2=brms:::subset_data2(model$data2, -omitted), refresh=0, chains=chains));
lppds <- log_lik(model_updated, newdata=mf[predicted, , drop=FALSE], newdata2=brms:::subset_data2(model$data2, predicted),
allow_new_levels=TRUE, resp=NULL, combine=TRUE, chains=chains);
} else if( exists("subset_autocor", envir=asNamespace("brms")) )
{
# Older versions of brms:
model2 <- brms:::subset_autocor(model, -omitted, incl_car=TRUE);
model_updated <- base::suppressWarnings(update(model2, newdata=mf_omitted, refresh=0, chains=chains));
lppds <- log_lik(model_updated, newdata=mf[predicted, , drop=FALSE], allow_new_levels=TRUE, resp=NULL, combine=TRUE, chains=chains);
} else
{
stop("Unknown version of brms!");
}
return (list("obs_order"=predicted, "lppds"=lppds));
}, mc.cores=ifelse(exists("brms_ncores"), brms_ncores, detectCores()));
# Put them back in the form expected by the following unmodified code:
obs_order <- lapply(kfold_results, function(x) x$obs_order);
lppds <- lapply(kfold_results, function(x) x$lppds);
elpds <- brms:::ulapply(lppds, function(x) apply(x, 2, brms:::log_mean_exp))
# make sure elpds are put back in the right order
elpds <- elpds[order(unlist(obs_order))]
elpd_kfold <- sum(elpds)
se_elpd_kfold <- sqrt(length(elpds) * var(elpds))
rnames <- c("elpd_kfold", "p_kfold", "kfoldic")
cnames <- c("Estimate", "SE")
estimates <- matrix(nrow = 3, ncol = 2, dimnames = list(rnames, cnames))
estimates[1, ] <- c(elpd_kfold, se_elpd_kfold)
estimates[3, ] <- c(-2 * elpd_kfold, 2 * se_elpd_kfold)
out <- brms:::nlist(estimates, pointwise = cbind(elpd_kfold = elpds))
atts <- brms:::nlist(K, Ksub, NULL, folds, fold_type)
attributes(out)[names(atts)] <- atts
out <- structure(out, class = c("kfold", "loo"))
attr(out, "yhash") <- brms:::hash_response(model, newdata=NULL, resp=NULL);
attr(out, "model_name") <- "";
model$criteria$kfold <- out;
model;
}
}
# Bayesian fit indices for a given model:
brms_fit_indices <- function(model, indices=c("bayes_R2", "loo", "waic", "kfold"), K=10, verbose=TRUE, do.parallel=TRUE)
{
if( "bayes_R2" %in% indices )
{
if( verbose) cat("R^2...\n");
#attr(model, "R2") <- bayes_R2(model);
model <- add_criterion(model, "bayes_R2");
} else
{
# Remove the criterion (if already there):
if( !is.null(model$criteria) && "bayes_R2" %in% names(model$criteria) ) model$criteria[[ which(names(model$criteria) == "bayes_R2") ]] <- NULL;
}
if( "loo" %in% indices )
{
if( verbose) cat("LOO...\n");
model <- add_criterion(model, "loo");
} else
{
# Remove the criterion (if already there):
if( !is.null(model$criteria) && "loo" %in% names(model$criteria) ) model$criteria[[ which(names(model$criteria) == "loo") ]] <- NULL;
}
if( "waic" %in% indices )
{
if( verbose) cat("WAIC...\n");
model <- add_criterion(model, "waic");
} else
{
# Remove the criterion (if already there):
if( !is.null(model$criteria) && "waic" %in% names(model$criteria) ) model$criteria[[ which(names(model$criteria) == "waic") ]] <- NULL;
}
if( "kfold" %in% indices )
{
if( verbose) cat(paste0("KFOLD (K=",K,")...\n"));
model1 <- NULL;
if( !do.parallel )
{
try(model1 <- add_criterion(model, "kfold", K=K, chains=1), silent=TRUE);
} else
{
try(model1 <- add_criterion_kfold_parallel(model, K=K, chains=1), silent=TRUE);
}
if( !is.null(model1) )
{
model <- model1;
} else
{
# Remove the criterion (if already there):
if( !is.null(model$criteria) && "kfold" %in% names(model$criteria) ) model$criteria[[ which(names(model$criteria) == "kfold") ]] <- NULL;
}
} else
{
# Remove the criterion (if already there):
if( !is.null(model$criteria) && "kfold" %in% names(model$criteria) ) model$criteria[[ which(names(model$criteria) == "kfold") ]] <- NULL;
}
gc();
return (model);
}
# Bayesian model comparison:
#model1 <- b_uvbm__blue
#model2 <- b_popsize__blue
brms_compare_models <- function(model1, model2, name1=NA, name2=NA, bayes_factor=TRUE, print_results=TRUE)
{
if( !is.null(model1$criteria) && "bayes_R2" %in% names(model1$criteria) && !is.null(model1$criteria$bayes_R2) &&
!is.null(model2$criteria) && "bayes_R2" %in% names(model2$criteria) && !is.null(model2$criteria$bayes_R2) )
{
R2_1_2 <- (mean(model1$criteria$bayes_R2) - mean(model2$criteria$bayes_R2));
} else
{
R2_1_2 <- NA;
}
if( bayes_factor )
{
invisible(capture.output(bf_1_2 <- brms::bayes_factor(model1, model2)$bf));
bf_interpret_1_2 <- BF_interpretation(bf_1_2, ifelse(!is.na(name1), name1, "model1"), ifelse(!is.na(name2), name2, "model2"));
}
else
{
bf_1_2 <- NULL; bf_interpret_1_2 <- NA;
}
if( !is.null(model1$criteria) && "loo" %in% names(model1$criteria) && !is.null(model1$criteria$loo) &&
!is.null(model2$criteria) && "loo" %in% names(model2$criteria) && !is.null(model2$criteria$loo) )
{
loo_1_2 <- loo_compare(model1, model2, criterion="loo", model_names=c(ifelse(!is.na(name1), name1, "model1"), ifelse(!is.na(name2), name2, "model2")));
} else
{
loo_1_2 <- NA;
}
if( !is.null(model1$criteria) && "waic" %in% names(model1$criteria) && !is.null(model1$criteria$waic) &&
!is.null(model2$criteria) && "waic" %in% names(model2$criteria) && !is.null(model2$criteria$waic) )
{
waic_1_2 <- loo_compare(model1, model2, criterion="waic", model_names=c(ifelse(!is.na(name1), name1, "model1"), ifelse(!is.na(name2), name2, "model2")));
mw_1_2 <- model_weights(model1, model2, weights="waic", model_names=c(ifelse(!is.na(name1), name1, "model1"), ifelse(!is.na(name2), name2, "model2")));
} else
{
waic_1_2 <- NA;
mw_1_2 <- NA;
}
if( !is.null(model1$criteria) && "kfold" %in% names(model1$criteria) && !is.null(model1$criteria$kfold) &&
!is.null(model2$criteria) && "kfold" %in% names(model2$criteria) && !is.null(model2$criteria$kfold) )
{
kfold_1_2 <- loo_compare(model1, model2, criterion="kfold", model_names=c(ifelse(!is.na(name1), name1, "model1"), ifelse(!is.na(name2), name2, "model2")));
} else
{
kfold_1_2 <- NA;
}
if( print_results )
{
cat(paste0("\nComparing models '",ifelse(!is.na(name1), name1, "model1"),"' and '",ifelse(!is.na(name2), name2, "model2"),"':\n\n"));
cat(paste0("\ndelta R^2 = ",sprintf("%.1f%%",100*R2_1_2),"\n\n"));
cat(bf_interpret_1_2,"\n\n");
cat("\nLOO:\n"); print(loo_1_2);
cat("\nWAIC:\n"); print(waic_1_2);
cat("\nKFOLD:\n"); print(kfold_1_2);
cat("\nModel weights (WAIC):\n"); print(mw_1_2);
cat("\n");
}
gc();
return (list("R2"=R2_1_2, "BF"=bf_1_2, "BF_interpretation"=bf_interpret_1_2, "LOO"=loo_1_2, "WAIC"=waic_1_2, "KFOLD"=kfold_1_2, "model_weights_WAIC"=mw_1_2));
}
# print model comparisons
.print.model.comparison <- function(a=NULL, a.names=NULL, b=NULL) # a is the anova and b is the Bayesian comparison (only one can be non-NULL), a.names are the mappings between the inner and user-friendly model names
{
# ANOVA:
if( !is.null(a) )
{
a <- as.data.frame(a);
if( !is.null(a.names) )
{
if( length(a.names) != nrow(a) || !all(names(a.names) %in% rownames(a)) )
{
stop("a.names do not correspond to the anova model names!");
return (NULL);
}
rownames(a) <- a.names[rownames(a)];
}
i <- which.min(a$AIC);
s.a <- sprintf("%s %s %s: ΔAIC=%.1f, ΔBIC=%.1f",
rownames(a)[i],
ifelse((!is.na(a[2,"Pr(>Chisq)"]) && a[2,"Pr(>Chisq)"] <0.05) || (abs(a$AIC[1] - a$AIC[2]) > 3), ">", "≈"),
rownames(a)[3-i],
abs(a$AIC[1] - a$AIC[2]),
abs(a$BIC[1] - a$BIC[2]));
if( !is.na(a[2,"Pr(>Chisq)"]) )
{
s.a <- paste0(s.a,
sprintf(", *p*=%s", scinot(a[2,"Pr(>Chisq)"])));
}
# return value:
return (s.a);
}
# Bayesian comparison:
if( !is.null(b) )
{
s.b <- sprintf("BF: %s, ΔLOO(%s %s %s)=%.1f (%.1f), ΔWAIC(%s %s %s)=%.1f (%.1f), ΔKFOLD(%s %s %s)=%.1f (%.1f)",
# BF:
b$BF_interpretation,
# LOO:
rownames(b$LOO)[1],
ifelse(abs(b$LOO[1,1]-b$LOO[2,1]) < 4 || abs(b$LOO[1,1]-b$LOO[2,1]) < b$LOO[2,2], "≈", ifelse(abs(b$LOO[1,1]-b$LOO[2,1]) < 2*b$LOO[2,2], ">", ">>")),
rownames(b$LOO)[2],
abs(b$LOO[1,1]-b$LOO[2,1]), b$LOO[2,2],
# WAIC:
rownames(b$WAIC)[1],
ifelse(abs(b$WAIC[1,1]-b$WAIC[2,1]) < 4 || abs(b$WAIC[1,1]-b$WAIC[2,1]) < b$WAIC[2,2], "≈", ifelse(abs(b$WAIC[1,1]-b$WAIC[2,1]) < 2*b$WAIC[2,2], ">", ">>")),
rownames(b$WAIC)[2],
abs(b$WAIC[1,1]-b$WAIC[2,1]), b$WAIC[2,2],
# KFOLD:
rownames(b$KFOLD)[1],
ifelse(abs(b$KFOLD[1,1]-b$KFOLD[2,1]) < 4 || abs(b$KFOLD[1,1]-b$KFOLD[2,1]) < b$KFOLD[2,2], "≈", ifelse(abs(b$KFOLD[1,1]-b$KFOLD[2,1]) < 2*b$KFOLD[2,2], ">", ">>")),
rownames(b$KFOLD)[2],
abs(b$KFOLD[1,1]-b$KFOLD[2,1]), b$KFOLD[2,2]);
# return value:
return (s.b);
}
}
# Standard error of the mean:
std <- function(x) sd(x)/sqrt(length(x))
# Root Mean Square Error (RMSE) between observed (y) and predicted (yrep) values:
rmse <- function(y, yrep)
{
yrep_mean <- colMeans(yrep)
sqrt(mean((yrep_mean - y)^2))
}
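A toy check of rmse on a 2-draw, 3-observation posterior predictive matrix (values invented for illustration; rows are posterior draws, columns are observations, matching log_lik's layout):

```r
rmse <- function(y, yrep) sqrt(mean((colMeans(yrep) - y)^2))  # as defined above
y <- c(0, 1, 1)                    # observed outcomes
yrep <- rbind(c(0.2, 0.8, 0.9),    # draw 1
              c(0.0, 1.0, 0.7))    # draw 2
round(rmse(y, yrep), 3)  # sqrt(mean(c(0.1, -0.1, -0.2)^2)) = 0.141
```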
# Log odds to probability (logistic regression):
lo2p <- function(x){ o <- exp(x); return (o/(1+o));}
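lo2p is the inverse logit, i.e. the same transform as base R's plogis. As a sanity check against the grand mean reported above, a hypothetical intercept of 1.93 on the log-odds scale corresponds to roughly the 87.3% matching rate:

```r
lo2p <- function(x){ o <- exp(x); o/(1+o) }          # as defined above
lo2p(0)                                               # 0.5: log odds of 0 = chance
round(lo2p(1.93), 3)                                  # 0.873
all.equal(lo2p(1.93), plogis(1.93))                   # TRUE: same as base R's plogis
```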
# Scientific notation using Markdown conventions (inspired from https://www.r-bloggers.com/2015/03/scientific-notation-for-rlatex/):
scinot <- function(xs, digits=2, pvalue=TRUE)
{
scinot1 <- function(x)
{
sign <- "";
if(x < 0)
{
sign <- "-";
x <- -x;
}
exponent <- floor(log10(x));
if(exponent && pvalue && exponent < -3)
{
xx <- round(x / 10^exponent, digits=digits);
e <- paste0("×10^", round(exponent,0), "^");
} else
{
xx <- round(x, digits=digits+1);
e <- "";
}
paste0(sign, xx, e);
}
vapply(xs, scinot1, character(1));
}
# Escape * in a string:
escstar <- function(s)
{
gsub("*", "\\*", s, fixed=TRUE);
}
## Model fitting:
.save_models <- TRUE; # should we also save the actual models? these models are quite large on disk but allow later checks without re-running everything
if( !file.exists("./models/web_regressions_L1_summaries.rds") ||
(.save_models && !file.exists("./models/web_regressions_L1_models.rds")) )
{
# Actually fit the models and save various summaries (and potentially the actual models as well)
# This is pretty computationally expensive, especially the Bayesian models!
# icc:
m0 <- glmer(Match ~ 1 +
(1 | Language), # (1 | Family) (1 | Autotyp_Area) have variance 0!
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m0); # looks good
icc(m0); # 8.6%
# check random slopes:
m_Order <- glmer(Match ~ 1 + Order +
(1 | Language), # random slope for Order results in boundary (singular) fit
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_Order); # p=1.87e-05 ***
anova(m_Order, m0); # p=1.266e-05 *** ΔAIC=-17.06
m_Sex <- glmer(Match ~ 1 + Sex +
(1 | Language), # random slope for Sex results in boundary (singular) fit
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_Sex); # p=0.652
anova(m_Sex, m0); # p=0.657 ΔAIC=1.8
m_Age <- glmer(Match ~ 1 + Age +
(1 + Age | Language), # random slope for Age has tiny variance (0.0006003)
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_Age); # p=0.944
anova(m_Age, m0); # p=0.26 ΔAIC=1.98
m_rl <- glmer(Match ~ 1 + r_l_distinction_L1_f +
(1 + r_l_distinction_L1_f | Language), # decent variance (0.5186), all is good
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_rl); # p=0.572
anova(m_rl, m0); # p=0.93 ΔAIC=5.55
m_trill <- glmer(Match ~ 1 + trill_real_L1_f +
(1 | Language), # random slope for trill_real_L1 has correlation slope-intercept 1.00 and boundary (singular) fit
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_trill); # p=0.201
anova(m_trill, m0); # p=0.19 ΔAIC=0.31
# VIF:
m_vif <- glmer(Match ~ 1 + r_l_distinction_L1_f + trill_real_L1_f + Order + Sex + Age +
(1 | Language),
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_vif);
performance::check_collinearity(m_vif); # all VIFs < 1.25, so OK
# L2:
m_rl2 <- glmer(Match ~ 1 + r_l_distinction_L2_f +
(1 + r_l_distinction_L2_f | Language), # decent variance (0.5123), all is good
data = web[ !is.na(web$r_l_distinction_L2_f), ],
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_rl2); # p=0.814
anova(m_rl2, update(m0, . ~ ., data=web[ !is.na(web$r_l_distinction_L2_f), ])); # p=0.96 ΔAIC=5.69
m_trill2 <- glmer(Match ~ 1 + trill_real_L2_f +
(1 + trill_real_L2_f | Language),
data = web[ !is.na(web$trill_real_L2_f), ],
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_trill2); # p=0.985
anova(m_trill2, update(m0, . ~ ., data=web[ !is.na(web$trill_real_L2_f), ])); # p=0.76 ΔAIC=4.82
# VIF:
m_vif2 <- glmer(Match ~ 1 + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L2_f + trill_real_L2_f + Order + Sex + Age +
(1 | Language),
data = web,
family = binomial(),
control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun = 1e7)));
summary(m_vif2);
performance::check_collinearity(m_vif2); # all VIFs < 1.25, so OK
# with brms:
# prior predictive checks:
# what priors we can set (use Order for that):
get_prior(Match ~ 1 + Order +
(1 + Order | Language) +
(1 + Order | Family) +
(1 + Order | Autotyp_Area),
data=web,
family=bernoulli(link='logit'));
# -> the "sd" priors seem alright (student_t(3, 0, 2.5)), as does the "Intercept" (student_t(3, 0, 2.5)), but lkj(1) for "cor" might be too accepting of extreme correlations, and the (flat) prior for "b" is clearly not OK
# so, we keep "sd" and "Intercept" but use lkj(2) for "cor" and student_t(5, 0, 2.5) for "b"
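To see what student_t(5, 0, 2.5) implies for a slope on the probability scale, one can push prior draws through the inverse logit (a quick base-R Monte Carlo sketch; the numbers are approximate):

```r
# Simulate the slope prior and look at its implied probabilities:
set.seed(1)
draws <- 2.5 * rt(1e5, df = 5)   # student_t(5, 0, 2.5) draws
p <- plogis(draws)               # implied probabilities
round(quantile(p, c(0.025, 0.5, 0.975)), 2)
# centered on 0.5 but spreading over nearly the whole (0, 1) range:
# weakly informative on the log-odds scale rather than flat
```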
b_priors <- brms::brm(Match ~ 1 + Order +
(1 + Order | Language) +
(1 + Order | Family) +
(1 + Order | Autotyp_Area),
data=web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
sample_prior='only', # needed for prior predictive checks
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
plot_prefix <- "./plots/web_L1";
g <- arrangeGrob(pp_check(b_priors, type='stat', stat='min') + xlab('p(match)') + ggtitle('Prior predictive distribution of minimum values'),
pp_check(b_priors, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Prior predictive distribution of means'),
pp_check(b_priors, type='stat', stat='max') + xlab('p(match)') + ggtitle('Prior predictive distribution of maximum values'),
ncol=1); # seems fine but our value is a bit extreme
as_ggplot(g); ggsave(paste0(plot_prefix,"_prior_predictive_checks.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
b <- b0 <- brm(Match ~ 1 +
(1 | Language) +
(1 | Family) +
(1 | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b0); mcmc_plot(b0, type="trace"); mcmc_plot(b0); # very decent
bayestestR::hdi(b0, ci=0.95);
b0 <- brms_fit_indices(b0);
# posterior predictive checks
plot_prefix <- "./plots/web_L1_null"; b <- b0;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
b_Order <- brm(Match ~ 1 + Order +
(1 + Order | Language) +
(1 + Order | Family) +
(1 + Order | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b_Order); mcmc_plot(b_Order, type="trace"); mcmc_plot(b_Order); # very decent
brms::hypothesis(b_Order, c("Orderl_first = 0")); # p=0.50
b_Order <- brms_fit_indices(b_Order);
brms_compare_models(b0, b_Order, "[null model]", "[+ order]"); # order clearly better
# posterior predictive checks
plot_prefix <- "./plots/web_L1_order"; b <- b_Order;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="Order"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_Sex <- brm(Match ~ 1 + Sex +
(1 + Sex | Language) +
(1 + Sex | Family) +
(1 + Sex | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b_Sex); mcmc_plot(b_Sex, type="trace"); mcmc_plot(b_Sex); # very decent
brms::hypothesis(b_Sex, c("SexM = 0")); # p=0.84
b_Sex <- brms_fit_indices(b_Sex);
brms_compare_models(b0, b_Sex, "[null model]", "[+ sex]"); # sex does not matter
# posterior predictive checks
plot_prefix <- "./plots/web_L1_sex"; b <- b_Sex;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="Sex"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_Age <- brm(Match ~ 1 + Age +
(1 + Age | Language) +
(1 + Age | Family) +
(1 + Age | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b_Age); mcmc_plot(b_Age, type="trace"); mcmc_plot(b_Age); # very decent
brms::hypothesis(b_Age, c("Age = 0")); # p=0.99
b_Age <- brms_fit_indices(b_Age);
brms_compare_models(b0, b_Age, "[null model]", "[+ age]"); # age and null seem equivalent
# posterior predictive checks
plot_prefix <- "./plots/web_L1_age"; b <- b_Age;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="Age"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_lr <- brm(Match ~ 1 + r_l_distinction_L1_f +
(1 + r_l_distinction_L1_f | Language) +
(1 + r_l_distinction_L1_f | Family) +
(1 + r_l_distinction_L1_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_lr); mcmc_plot(b_lr, type="trace"); mcmc_plot(b_lr); # very decent
brms::hypothesis(b_lr, c("r_l_distinction_L1_fdistinct = 0")); # p=0.76
b_lr <- brms_fit_indices(b_lr);
brms_compare_models(b0, b_lr, "[null model]", "[+ l/r distinction]"); # l/r distinction is worse than null
# posterior predictive checks
plot_prefix <- "./plots/web_L1_lrdistinction"; b <- b_lr;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="r_l_distinction_L1_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_trill <- brm(Match ~ 1 + trill_real_L1_f +
(1 + trill_real_L1_f | Language) +
(1 + trill_real_L1_f | Family) +
(1 + trill_real_L1_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_trill); mcmc_plot(b_trill, type="trace"); mcmc_plot(b_trill); # very decent
brms::hypothesis(b_trill, c("trill_real_L1_fyes = 0")); # p=0.65
b_trill <- brms_fit_indices(b_trill);
brms_compare_models(b0, b_trill, "[null model]", "[+ trill]"); # trill might be better than null
# posterior predictive checks
plot_prefix <- "./plots/web_L1_trill"; b <- b_trill;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="trill_real_L1_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
# L2:
b_lr2 <- brm(Match ~ 1 + r_l_distinction_L2_f +
(1 + r_l_distinction_L2_f | Language) +
(1 + r_l_distinction_L2_f | Family) +
(1 + r_l_distinction_L2_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_lr2); mcmc_plot(b_lr2, type="trace"); mcmc_plot(b_lr2); # very decent
brms::hypothesis(b_lr2, c("r_l_distinction_L2_fdistinct = 0")); # p=0.55
#b_lr2 <- brms_fit_indices(b_lr2);
#brms_compare_models(b0, b_lr2, "[null model]", "[+ l/r distinction in L2]"); # <-- needs refitting the null on the restricted data
# posterior predictive checks
plot_prefix <- "./plots/web_L2_lrdistinction"; b <- b_lr2;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="r_l_distinction_L2_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_trill2 <- brm(Match ~ 1 + trill_real_L2_f +
(1 + trill_real_L2_f | Language) +
(1 + trill_real_L2_f | Family) +
(1 + trill_real_L2_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_trill2); mcmc_plot(b_trill2, type="trace"); mcmc_plot(b_trill2); # very decent
brms::hypothesis(b_trill2, c("trill_real_L2_fyes = 0")); # p=0.77
#b_trill2 <- brms_fit_indices(b_trill2);
#brms_compare_models(b0, b_trill2, "[null model]", "[+ trill in L2]"); # trill might be better than null
# posterior predictive checks
plot_prefix <- "./plots/web_L2_trill"; b <- b_trill2;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="trill_real_L2_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
## Complex models (with manual simplification):
b_L1_full <- brm(Match ~ 1 +
Order + # order (we know it has a main effect)
r_l_distinction_L1_f + trill_real_L1_f + # r/l (doesn't have a main effect) and trill (might have a main effect)
r_l_distinction_L1_f:Order + trill_real_L1_f:Order + # do they interact with order?
r_l_distinction_L1_f:trill_real_L1_f + # do they interact between them?
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:Order + trill_real_L1_f:Order + r_l_distinction_L1_f:trill_real_L1_f | Language) + # full random effects structure (probably overkill)
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:Order + trill_real_L1_f:Order + r_l_distinction_L1_f:trill_real_L1_f | Family) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:Order + trill_real_L1_f:Order + r_l_distinction_L1_f:trill_real_L1_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_L1_full); mcmc_plot(b_L1_full, type="trace", variable="^b_", regex=TRUE); mcmc_plot(b_L1_full, variable="^b_", regex=TRUE); # very decent
brms::hypothesis(b_L1_full, c("Orderl_first = 0", # p=0.43
"r_l_distinction_L1_fdistinct = 0", # p=0.72
"trill_real_L1_fyes = 0", # p=0.56
"Orderl_first:r_l_distinction_L1_fdistinct = 0", # p=0.68
"Orderl_first:trill_real_L1_fyes = 0", # p=0.67
"r_l_distinction_L1_fdistinct:trill_real_L1_fyes = 0")); # p=0.55
b_L1_full <- brms_fit_indices(b_L1_full);
brms_compare_models(b0, b_L1_full, "[null model]", "[L1 full]"); # much better than null, but the question is why...
# posterior predictive checks
plot_prefix <- "./plots/web_L1_full"; b <- b_L1_full;
mcmc_plot(b, type="trace", variable="^b_", regex=TRUE); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b, variable="^b_", regex=TRUE); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms=c("trill_real_L1_f", "r_l_distinction_L1_f", "Order")); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
# manual simplification of the full model:
# - the interactions of Order seem the least impressive, so start with them:
b_L1_full_no_trill_order <- brm(Match ~ 1 +
Order + # order (we know it has a main effect)
r_l_distinction_L1_f + trill_real_L1_f + # r/l (doesn't have a main effect) and trill (might have a main effect)
r_l_distinction_L1_f:Order + # do they interact with order?
r_l_distinction_L1_f:trill_real_L1_f + # do they interact between them?
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:Order + r_l_distinction_L1_f:trill_real_L1_f | Language) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:Order + r_l_distinction_L1_f:trill_real_L1_f | Family) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:Order + r_l_distinction_L1_f:trill_real_L1_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_L1_full_no_trill_order); mcmc_plot(b_L1_full_no_trill_order, type="trace", variable="^b_", regex=TRUE); mcmc_plot(b_L1_full_no_trill_order, variable="^b_", regex=TRUE); # very decent
brms::hypothesis(b_L1_full_no_trill_order, c("Orderl_first = 0", # p=0.42
"r_l_distinction_L1_fdistinct = 0", # p=0.72
"trill_real_L1_fyes = 0", # p=0.55
"Orderl_first:r_l_distinction_L1_fdistinct = 0", # p=0.67
"r_l_distinction_L1_fdistinct:trill_real_L1_fyes = 0")); # p=0.55
b_L1_full_no_trill_order <- brms_fit_indices(b_L1_full_no_trill_order);
brms_compare_models(b0, b_L1_full_no_trill_order, "[null model]", "[L1 full simplified]"); # much better than null
brms_compare_models(b_L1_full, b_L1_full_no_trill_order, "[L1 full]", "[L1 full simplified]"); # simplified is better than full...
# - r_l_distinction_L1_f:Order:
b_L1_full_no_trill_order_rl <- brm(Match ~ 1 +
Order + # order (we know it has a main effect)
r_l_distinction_L1_f + trill_real_L1_f + # r/l (doesn't have a main effect) and trill (might have a main effect)
r_l_distinction_L1_f:trill_real_L1_f + # do they interact between them?
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:trill_real_L1_f | Language) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:trill_real_L1_f | Family) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f + r_l_distinction_L1_f:trill_real_L1_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_L1_full_no_trill_order_rl); mcmc_plot(b_L1_full_no_trill_order_rl, type="trace", variable="^b_", regex=TRUE); mcmc_plot(b_L1_full_no_trill_order_rl, variable="^b_", regex=TRUE); # very decent
brms::hypothesis(b_L1_full_no_trill_order_rl, c("Orderl_first = 0", # p=0.46
"r_l_distinction_L1_fdistinct = 0", # p=0.73
"trill_real_L1_fyes = 0", # p=0.55
"r_l_distinction_L1_fdistinct:trill_real_L1_fyes = 0")); # p=0.56
b_L1_full_no_trill_order_rl <- brms_fit_indices(b_L1_full_no_trill_order_rl);
brms_compare_models(b0, b_L1_full_no_trill_order_rl, "[null model]", "[L1 full simplified]"); # much better than null
brms_compare_models(b_L1_full, b_L1_full_no_trill_order_rl, "[L1 full]", "[L1 full simplified]"); # simplified is better than full...
# - r_l_distinction_L1_f:trill_real_L1_f (aka, main effects only):
b_L1_full_main_effects <- brm(Match ~ 1 +
Order + # order (we know it has a main effect)
r_l_distinction_L1_f + trill_real_L1_f + # r/l (doesn't have a main effect) and trill (might have a main effect)
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f | Language) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f | Family) +
(1 + Order + r_l_distinction_L1_f + trill_real_L1_f | Autotyp_Area),
data = web,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_L1_full_main_effects); mcmc_plot(b_L1_full_main_effects, type="trace", variable="^b_", regex=TRUE); mcmc_plot(b_L1_full_main_effects, variable="^b_", regex=TRUE); # very decent
brms::hypothesis(b_L1_full_main_effects, c("Orderl_first = 0", # p=0.46
"r_l_distinction_L1_fdistinct = 0", # p=0.73
"trill_real_L1_fyes = 0")); # p=0.56
b_L1_full_main_effects <- brms_fit_indices(b_L1_full_main_effects);
brms_compare_models(b0, b_L1_full_main_effects, "[null model]", "[L1 full simplified]"); # much better than null
brms_compare_models(b_L1_full, b_L1_full_main_effects, "[L1 full]", "[L1 full simplified]"); # simplified is better than full...
## save all these results for later:
# save the models (if requested):
if( .save_models )
{
web_regressions_L1_models <- list("glmer"=list("null" =m0,
"order" =m_Order,
"sex" =m_Sex,
"age" =m_Age,
"rl" =m_rl,
"trill" =m_trill,
"rl2" =m_rl2,
"trill2"=m_trill2),
"brms"=list("null" =b0,
"order" =b_Order,
"sex" =b_Sex,
"age" =b_Age,
"rl" =b_lr,
"trill" =b_trill,
"rl2" =b_lr2,
"trill2"=b_trill2));
save(web_regressions_L1_models, file="./models/web_regressions_L1_models.rds", compress="xz", compression_level=9);
}
# save the summaries (much less disk space but enough info to understand the results):
web_regressions_L1_summaries <- list("glmer"=list("vif" =performance::check_collinearity(m_vif),
"vif2" =performance::check_collinearity(m_vif2),
"icc" =icc(m0),
"null" =list("summary"=summary(m0),
"CI95"=confint(m0, method="Wald")),
"order" =list("summary"=summary(m_Order),
"CI95"=confint(m_Order, method="Wald"),
"anova_0"=anova(m_Order, m0)),
"sex" =list("summary"=summary(m_Sex),
"CI95"=confint(m_Sex, method="Wald"),
"anova_0"=anova(m_Sex, m0)),
"age" =list("summary"=summary(m_Age),
"CI95"=confint(m_Age, method="Wald"),
"anova_0"=anova(m_Age, m0)),
"rl" =list("summary"=summary(m_rl),
"CI95"=confint(m_rl, method="Wald"),
"anova_0"=anova(m_rl, m0)),
"trill" =list("summary"=summary(m_trill),
"CI95"=confint(m_trill, method="Wald"),
"anova_0"=anova(m_trill, m0)),
"rl2" =list("summary"=summary(m_rl2),
"CI95"=confint(m_rl2, method="Wald"),
"anova_0"=anova(m_rl2, update(m0, . ~ ., data=web[ !is.na(web$r_l_distinction_L2_f), ]))),
"trill2"=list("summary"=summary(m_trill2),
"CI95"=confint(m_trill2, method="Wald"),
"anova_0"=anova(m_trill2, update(m0, . ~ ., data=web[ !is.na(web$trill_real_L2_f), ])))),
"brms"=list("prior_pred_check"=list("plot_path"="./plots/web_L1_prior_predictive_checks.jpg"),
"null" =list("plot_prefix"="./plots/web_L1_null",
"hdi"=bayestestR::hdi(b0, ci=0.95),
"rope"=bayestestR::rope(b0, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b0), "ranef"=ranef(b0),
"summary"=capture.output(summary(b0))),
"order" =list("plot_prefix"="./plots/web_L1_order",
"hdi"=bayestestR::hdi(b_Order, ci=0.95),
"hypotheses"=brms::hypothesis(b_Order, c("Orderl_first = 0")),
"rope"=bayestestR::rope(b_Order, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_Order), "ranef"=ranef(b_Order),
"summary"=capture.output(summary(b_Order)),
"cmp_0"=brms_compare_models(b0, b_Order, "[null model]", "[+ order]")),
"sex" =list("plot_prefix"="./plots/web_L1_sex",
"hdi"=bayestestR::hdi(b_Sex, ci=0.95),
"hypotheses"=brms::hypothesis(b_Sex, c("SexM = 0")),
"rope"=bayestestR::rope(b_Sex, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_Sex), "ranef"=ranef(b_Sex),
"summary"=capture.output(summary(b_Sex)),
"cmp_0"=brms_compare_models(b0, b_Sex, "[null model]", "[+ sex]")),
"age" =list("plot_prefix"="./plots/web_L1_age",
"hdi"=bayestestR::hdi(b_Age, ci=0.95),
"hypotheses"=brms::hypothesis(b_Age, c("Age = 0")),
"rope"=bayestestR::rope(b_Age, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_Age), "ranef"=ranef(b_Age),
"summary"=capture.output(summary(b_Age)),
"cmp_0"=brms_compare_models(b0, b_Age, "[null model]", "[+ age]")),
"rl" =list("plot_prefix"="./plots/web_L1_lrdistinction",
"hdi"=bayestestR::hdi(b_lr, ci=0.95),
"hypotheses"=brms::hypothesis(b_lr, c("r_l_distinction_L1_fdistinct = 0")),
"rope"=bayestestR::rope(b_lr, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_lr), "ranef"=ranef(b_lr),
"summary"=capture.output(summary(b_lr)),
"cmp_0"=brms_compare_models(b0, b_lr, "[null model]", "[+ r/l distinction]")),
"trill" =list("plot_prefix"="./plots/web_L1_trill",
"hdi"=bayestestR::hdi(b_trill, ci=0.95),
"hypotheses"=brms::hypothesis(b_trill, c("trill_real_L1_fyes = 0")),
"rope"=bayestestR::rope(b_trill, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_trill), "ranef"=ranef(b_trill),
"summary"=capture.output(summary(b_trill)),
"cmp_0"=brms_compare_models(b0, b_trill, "[null model]", "[+ trill]")),
"rl2" =list("plot_prefix"="./plots/web_L2_lrdistinction",
"hdi"=bayestestR::hdi(b_lr2, ci=0.95),
"hypotheses"=brms::hypothesis(b_lr2, c("r_l_distinction_L2_fdistinct = 0")),
"rope"=bayestestR::rope(b_lr2, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_lr2), "ranef"=ranef(b_lr2),
"summary"=capture.output(summary(b_lr2)),
"cmp_0"=NULL),
"trill2"=list("plot_prefix"="./plots/web_L2_trill",
"hdi"=bayestestR::hdi(b_trill2, ci=0.95),
"hypotheses"=brms::hypothesis(b_trill2, c("trill_real_L2_fyes = 0")),
"rope"=bayestestR::rope(b_trill2, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_trill2), "ranef"=ranef(b_trill2),
"summary"=capture.output(summary(b_trill2)),
"cmp_0"=NULL)));
save(web_regressions_L1_summaries, file="./models/web_regressions_L1_summaries.rds", compress="xz", compression_level=9);
} else
{
#if( .save_models && file.exists("./models/web_regressions_L1_models.rds") ) load("./models/web_regressions_L1_models.rds") else web_regressions_L1_models <- NULL;
if( file.exists("./models/web_regressions_L1_summaries.rds") ) load("./models/web_regressions_L1_summaries.rds") else web_regressions_L1_summaries <- NULL;
}
First, we determined the probability of a perfect match (i.e., both responses are correct) without any predictors (aka the null model).
For clarity, the null model is:
Match ~ 1 +
(1 | Language) +
(1 | Family) +
(1 | Autotyp_Area)
and we are interested in the intercept, which represents the probability of a match.
print(web_regressions_L1_summaries$glmer$null$summary);
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Match ~ 1 + (1 | Language)
## Data: web
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+07))
##
## AIC BIC logLik deviance df.resid
## 680.4 690.0 -338.2 676.4 901
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9660 0.2730 0.3432 0.4057 0.5672
##
## Random effects:
## Groups Name Variance Std.Dev.
## Language (Intercept) 0.3111 0.5578
## Number of obs: 903, groups: Language, 25
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.0065 0.1599 12.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The intercept is clearly and significantly > 0 (p=4.05×10⁻³⁶) and translates into a probability of a match p(match)=88.1% 95%CI [84.5%, 91.1%], much higher than 50%.
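The reported p(match) follows from the inverse-logit of the intercept; a minimal sketch using the estimate and standard error printed in the summary above (the Wald 95%CI is estimate ± 1.96 × SE):

```r
# The intercept is on the log-odds scale, so p(match) is its inverse-logit.
est <- 2.0065; se <- 0.1599
plogis(est)                       # approx. 0.881 -> 88.1%
plogis(est + c(-1.96, 1.96) * se) # approx. [0.845, 0.911]
```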
The ICC of match estimated from the null model is 8.6%.
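The ICC can be recovered from the random-intercept variance of the null model, assuming the standard latent-threshold residual variance π²/3 for logit models (this is a sketch of the computation, not the exact call used in the script):

```r
# Latent-scale ICC for a logistic mixed model: the by-Language
# random-intercept variance divided by the total latent variance,
# where the logistic residual variance is pi^2/3.
v_language <- 0.3111 # random-intercept variance from the glmer null model
icc <- v_language / (v_language + pi^2 / 3)
round(icc * 100, 1) # approx. 8.6%
```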
cat(paste0(web_regressions_L1_summaries$brms$null$summary, collapse="\n"));
## Family: bernoulli
## Links: mu = logit
## Formula: Match ~ 1 + (1 | Language) + (1 | Family) + (1 | Autotyp_Area)
## Data: web (Number of observations: 903)
## Draws: 4 chains, each with iter = 10000; warmup = 4000; thin = 2;
## total post-warmup draws = 12000
##
## Group-Level Effects:
## ~Autotyp_Area (Number of levels: 6)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.42 0.38 0.02 1.37 1.00 6383 8921
##
## ~Family (Number of levels: 9)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.35 0.30 0.01 1.10 1.00 7063 8906
##
## ~Language (Number of levels: 25)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.59 0.20 0.24 1.02 1.00 6224 5212
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.90 0.35 1.18 2.57 1.00 9500 9082
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The intercept is clearly and significantly > 0 (% inside ROPE = 0.0%) and translates into a probability of a match p(match)=87.0% 95%HDI [76.4%, 92.8%], much higher than 50%.
The prior predictive checks support the choice of priors.
Then, we checked whether any of the potential predictors adds anything to the null model. (Please note that we show only the relevant information to keep this document simple.)
In the Bayesian approach, we fitted the maximal random effects structure (x is the predictor):
Match ~ 1 + x +
(1 + x | Language) +
(1 + x | Family) +
(1 + x | Autotyp_Area)
but we had to drastically simplify it for the frequentist models to achieve convergence (as indicated in each case).
We show them both as log odds ratios and as probabilities, as appropriate. We use a tabular presentation combining the frequentist (“ML”) and Bayesian (“B”) approaches and showing, as appropriate: the estimate with its 95%CI or 95%HDI; the p-value, or the proportion inside the ROPE and the p(=0) of the Bayesian formal hypothesis test; and the model comparison versus the null model, as the χ² test and ΔAIC(model − null), or the Bayes factor (BF), ΔLOO, ΔWAIC and ΔKFOLD (with their standard deviations).
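The probability columns in the tables are the inverse-logit of the corresponding log odds ratios; a minimal sketch using the ML order contrast from the first table (values rounded to two decimals here, so the result can differ from the tabled figure in the last digit):

```r
# Converting a contrast's log odds ratio (LOR) and its 95%CI to the
# probability scale with the inverse-logit, as done in the tables.
lor <- c(estimate = -0.92, lo = -1.34, hi = -0.50)
round(plogis(lor) * 100, 1) # approx. 28.5% [20.8%, 37.8%]
```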
Frequentist random effects is (1 | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 2.53 [2.11, 2.95] | 92.6% [89.2%, 95.0%] | 3.81×10⁻³² | |
| intercept (α) | B | 2.47 [1.77, 3.18] | 92.2% [85.4%, 96.0%] | 0.0% | |
| order (βl_first-r_first) | ML | -0.92 [-1.34, -0.50] | 28.6% [20.8%, 37.8%] | 1.87×10⁻⁵ | χ²(1)=19.06, p=1.27×10⁻⁵, ΔAIC=-17.1 |
| order (βl_first-r_first) | B | -0.92 [-2.06, 0.25] | 28.4% [11.3%, 56.3%] | pROPE=0.1%, p(β=0)=0.51 | BF: extreme evidence for [+ order] against [null model] (BF=0.000313), ΔLOO([+ order] >> [null model])=14.2 (5.9), ΔWAIC([+ order] >> [null model])=14.4 (5.9), ΔKFOLD([+ order] >> [null model])=13.3 (6.1) |
Frequentist random effects is (1 | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 2.03 [1.70, 2.37] | 88.4% [84.5%, 91.5%] | 7.34×10⁻³² | |
| intercept (α) | B | 1.93 [1.21, 2.65] | 87.3% [77.0%, 93.4%] | 0.0% | |
| sex (βM-F) | ML | -0.11 [-0.57, 0.36] | 47.3% [36.1%, 58.9%] | 0.652 | χ²(1)=0.20, p=0.657, ΔAIC=1.8 |
| sex (βM-F) | B | 0.07 [-1.09, 1.26] | 51.6% [25.2%, 78.0%] | pROPE=0.3%, p(β=0)=0.84 | BF: extreme evidence for [null model] against [+ sex] (BF=478), ΔLOO([null model] ≈ [+ sex])=3.4 (1.8), ΔWAIC([null model] ≈ [+ sex])=3.0 (1.7), ΔKFOLD([null model] ≈ [+ sex])=3.4 (2.4) |
Frequentist random effects is (1 + Age | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 2.11 [1.27, 2.95] | 89.2% [78.0%, 95.0%] | 9.23×10⁻⁷ | |
| intercept (α) | B | 1.72 [0.12, 3.25] | 84.8% [53.0%, 96.3%] | 0.0% | |
| age (β) | ML | -0.00 [-0.02, 0.02] | 50.0% [49.5%, 50.5%] | 0.944 | χ²(3)=4.02, p=0.26, ΔAIC=2.0 |
| age (β) | B | 0.01 [-0.04, 0.05] | 50.2% [49.0%, 51.2%] | pROPE=1.0%, p(β=0)=0.99 | BF: extreme evidence for [null model] against [+ age] (BF=6.36e+08), ΔLOO([null model] ≈ [+ age])=0.3 (1.9), ΔWAIC([null model] ≈ [+ age])=0.1 (1.9), ΔKFOLD([null model] ≈ [+ age])=1.5 (2.7) |
Frequentist random effects is
(1 + r_l_distinction_L1_f | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 1.80 [1.01, 2.59] | 85.8% [73.3%, 93.0%] | 8.07×10⁻⁶ | |
| intercept (α) | B | 1.75 [0.42, 3.10] | 85.3% [60.4%, 95.7%] | 0.0% | |
| r/l distinction (βdistinct-not) | ML | 0.25 [-0.61, 1.10] | 56.1% [35.2%, 75.1%] | 0.572 | χ²(3)=0.45, p=0.93, ΔAIC=5.6 |
| r/l distinction (βdistinct-not) | B | 0.07 [-1.81, 1.77] | 51.9% [14.1%, 85.5%] | pROPE=0.2%, p(β=0)=0.76 | BF: extreme evidence for [null model] against [+ r/l distinction] (BF=221), ΔLOO([null model] ≈ [+ r/l distinction])=0.7 (0.7), ΔWAIC([null model] ≈ [+ r/l distinction])=0.6 (0.7), ΔKFOLD([null model] ≈ [+ r/l distinction])=3.9 (1.9) |
Frequentist random effects is (1 | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 2.19 [1.76, 2.63] | 90.0% [85.3%, 93.3%] | 6.64×10⁻²³ | |
| | B | 2.02 [1.06, 2.93] | 88.3% [74.3%, 94.9%] | 0.0% | |
| [r] (βyes-no) | ML | -0.40 [-1.02, 0.22] | 40.0% [26.4%, 55.4%] | 0.201 | χ2(1)=1.69, p=0.194, ΔAIC=0.3 |
| [r] (βyes-no) | B | -0.84 [-3.46, 1.79] | 30.1% [3.1%, 85.7%] | pROPE=0.1%, p(β=0)=0.65 | BF: strong evidence for [null model] against [+ trill] (BF=16.4), ΔLOO([+ trill] ≈ [null model])=1.4 (1.8), ΔWAIC([+ trill] ≈ [null model])=1.5 (1.8), ΔKFOLD([+ trill] ≈ [null model])=3.0 (2.3) |
The frequentist random-effects structure is (1 + r_l_distinction_L2_f | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 16.57 [-103.99, 137.13] | 100.0% [0.0%, 100.0%] | 0.788 | |
| | B | 3.43 [-1.00, 8.63] | 96.9% [26.9%, 100.0%] | 0.0% | |
| r/l distinction L2 (βdistinct-not) | ML | -14.46 [-135.01, 106.10] | 0.0% [0.0%, 100.0%] | 0.814 | χ2(3)=0.30, p=0.959, ΔAIC=5.7 |
| r/l distinction L2 (βdistinct-not) | B | -1.42 [-6.28, 3.05] | 19.5% [0.2%, 95.5%] | pROPE=0.1%, p(β=0)=0.55 | not performed due to high computational load and predictable outcome |
The frequentist random-effects structure is (1 + trill_real_L2_f | Language).
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | ML | 2.18 [1.74, 2.62] | 89.9% [85.1%, 93.2%] | 3.01×10⁻²² | |
| | B | 2.06 [1.15, 3.06] | 88.7% [76.0%, 95.5%] | 0.0% | |
| [r] (βyes-no) | ML | -0.01 [-0.65, 0.64] | 49.8% [34.3%, 65.4%] | 0.985 | χ2(3)=1.18, p=0.759, ΔAIC=4.8 |
| [r] (βyes-no) | B | 0.33 [-1.38, 2.30] | 58.1% [20.0%, 90.9%] | pROPE=0.2%, p(β=0)=0.77 | not performed due to high computational load and predictable outcome |
In all cases the intercept α is highly significant and positive, and comparable with the null model, translating into match probabilities ≥ 80%, so much higher than the 50% expected by chance.
On the other hand, of all the potential predictors only order significantly improves on the null model, with “l first” decreasing the probability of a match by ≈28%.
Overall, the frequentist methods cannot accommodate the full random-effects structure, and deliberately dropping the random slopes for “r/l distinction” and “exists [r]” artificially inflates the corresponding fixed effects.
As an extra check, we also started from a full model containing all the potential predictors and their meaningful interactions, but, as expected from the simple regressions above, it simplifies to the same conclusions.
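The log odds ratios in these tables map onto the reported probabilities via the inverse-logit function; a minimal sketch of the conversion, using two of the intercepts reported above:

```r
# A log odds ratio (LOR) on the logit scale converts to a probability as
# p = 1 / (1 + exp(-LOR)), which is plogis() in base R.
round(100 * plogis(2.03), 1)  # ML intercept of the sex model -> 88.4
round(100 * plogis(2.11), 1)  # ML intercept of the age model -> 89.2
plogis(0)                     # LOR = 0 corresponds to chance -> 0.5
```

The reverse direction (probability to LOR) is qlogis(), the logit itself.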
First, how many participants?
nrow(field)
## [1] 127
Sex division
table(field$Sex)
##
## f m
## 91 36
Ages
summary(field$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 19.00 20.00 28.55 34.50 75.00
First, how many languages?
field %>% count(Language) %>% nrow()
## [1] 6
Does this number correspond with the L1s?
field %>% count(Name) %>% nrow()
## [1] 6
How many families?
field %>% count(Family) %>% nrow()
## [1] 4
How many have the R/L distinction in the L1 among the languages?
field %>% count(Name, r_l_distinction_L1) %>% count(r_l_distinction_L1)
How many really use the alveolar trill in L1 among the languages?
field %>% count(Name, trill_real_L1) %>% count(trill_real_L1)
How many really have the alveolar trill in L1 as an allophone among the languages?
field %>% count(Name, trill_occ_L1) %>% count(trill_occ_L1)
Now the same questions for L2. Note that these counts will not neatly sum to 6, due to the various possible L2 scenarios within a given L1.
How many have the R/L distinction in the L2 among the languages?
field %>% count(Name, r_l_distinction_L2) %>% count(r_l_distinction_L2)
How many really use the alveolar trill in L2 among the languages?
field %>% count(Name, trill_real_L2) %>% count(trill_real_L2)
How many really have the alveolar trill in L2 as an allophone among the languages?
field %>% count(Name, trill_occ_L2) %>% count(trill_occ_L2)
What is the grand average congruent behavior?
mean(field$Match)
## [1] 0.976378
97.6%!
What about only among those who have L1 without the distinction?
field %>%
filter(r_l_distinction_L1 == "0") %>%
summarize(mean_match = mean(Match, na.rm = TRUE))
100%! WOW.
What about only among those who have L1 without the distinction and no L2 that distinguishes?
field %>%
filter(r_l_distinction_L1 == "0") %>%
filter(!EnglishL2YesNo) %>%
filter(r_l_distinction_L2 == '0') %>%
summarize(mean_match = mean(Match, na.rm = TRUE))
There are no such people.
Compute average matching behavior per language and sort:
field_avg <- field %>%
group_by(Language) %>%
summarize(M = mean(Match)) %>%
arrange(desc(M)) %>%
mutate(percent = round(M, 2) * 100,
percent = str_c(percent, '%'))
# Show:
field_avg %>% print(n = Inf)
## # A tibble: 6 × 3
## Language M percent
## <chr> <dbl> <chr>
## 1 BR 1 100%
## 2 PA 1 100%
## 3 VA 1 100%
## 4 BE 0.982 98%
## 5 DE 0.947 95%
## 6 SR 0.923 92%
Check some demographics, also to report in the paper. First, the number of participants per language:
field %>%
count(Name, sort = TRUE) %>% print(n = Inf)
## # A tibble: 6 × 2
## Name n
## <chr> <int>
## 1 English UK 55
## 2 Tashlhiyt Berber 20
## 3 German 19
## 4 Brazilian Portuguese 13
## 5 Daakie 12
## 6 Palikur 8
Then, the number of L1 speakers who have the R/L distinction vs. those who don’t:
field %>% count(r_l_distinction_L1) %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
How many people do not have any L2?
# raw count of no L2; raw count of all ppl
sum(is.na(field$L2)); nrow(field)
## [1] 52
## [1] 127
# percentage no L2
sum(is.na(field$L2)) / nrow(field)
## [1] 0.4094488
# percentage with L2
1 - sum(is.na(field$L2)) / nrow(field)
## [1] 0.5905512
Check how many people knew English as their L2:
field %>% count(EnglishL2YesNo) %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
Of those that do not have an R/L distinction in their L1, how many have an R/L distinction in their L2?
field %>%
filter(r_l_distinction_L1 == '0') %>%
count(r_l_distinction_L2 == '1') %>%
mutate(prop = n / sum(n),
percent = round(prop, 2) * 100,
percent = str_c(percent, '%'))
How many “pure” speakers were there, i.e., people who (1) don’t know English, (2) don’t have an L1 with an R/L distinction, and (3) don’t know an L2 that distinguishes R/L?
field %>%
filter(r_l_distinction_L1 == '0') %>% # excludes English as well
filter(!EnglishL2YesNo) %>%
filter(r_l_distinction_L2 == '0') %>%
nrow()
## [1] 0
None.
Check the distribution of the r/l distinction across families to make decisions about the random-effects structure:
table(field$Family, field$r_l_distinction_L1)
##
## 0 1
## Afro-Asiatic 0 20
## Arawakan 8 0
## Austronesian 0 12
## IE 0 87
@Dan & @Bodo, can you please check whether we need some manipulation, like weighted contrast coding? Bodo, I know you used some kind of weighted modification for bouba/kiki, and I’m sure the proportions of our predictors are not balanced, see below:
field %>% count(r_l_distinction_L1) %>%
mutate(prop = n / sum(n))
# highly imbalanced
field %>% count(trill_real_L1) %>%
mutate(prop = n / sum(n))
field %>% count(trill_occ_L1) %>%
mutate(prop = n / sum(n))
# highly imbalanced
## And for L2, just in case
field %>% count(r_l_distinction_L2) %>%
mutate(prop = n / sum(n))
field %>% count(trill_real_L2) %>%
mutate(prop = n / sum(n))
field %>% count(trill_occ_L2) %>%
mutate(prop = n / sum(n))
# Code them as factors:
field$r_l_distinction_L1_f <- factor(c("same", "distinct")[field$r_l_distinction_L1 + 1], levels=c("same", "distinct"));
field$trill_real_L1_f <- factor(c("no", "yes")[field$trill_real_L1 + 1], levels=c("no", "yes"));
field$trill_occ_L1_f <- factor(c("no", "yes")[field$trill_occ_L1 + 1], levels=c("no", "yes"));
field$r_l_distinction_L2_f <- factor(c("same", "distinct")[field$r_l_distinction_L2 + 1], levels=c("same", "distinct"));
field$trill_real_L2_f <- factor(c("no", "yes")[field$trill_real_L2 + 1], levels=c("no", "yes"));
field$trill_occ_L2_f <- factor(c("no", "yes")[field$trill_occ_L2 + 1], levels=c("no", "yes"));
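One option for the weighted contrast coding raised above would be weighted effect (sum-to-zero) coding; this is only a sketch on a toy factor, since the models below keep the default treatment coding of the factors just defined:

```r
# Toy 3:1 imbalanced two-level factor mirroring r_l_distinction_L1_f:
f <- factor(c("same", "same", "same", "distinct"), levels = c("same", "distinct"))

# Plain sum (deviation) coding compares each level against the
# unweighted grand mean: same = +1, distinct = -1.
contrasts(f) <- contr.sum(2)

# Weighted effect coding scales the reference level by the observed
# proportions, so the codes sum to zero over observations (not levels)
# and the intercept estimates the weighted grand mean:
p <- prop.table(table(f))
contrasts(f) <- matrix(c(-p[["distinct"]] / p[["same"]], 1), ncol = 1)
contrasts(f)  # same = -1/3, distinct = 1 given the 3:1 split
```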
if( !file.exists("./models/field_regressions_summaries.rds") ||
    (.save_models && !file.exists("./models/field_regressions_models.rds")) )
{
# Actually fit the models and save various summaries (and potentially the actual models as well)
# It's pretty computationally expensive, especially the Bayesian ones!
# with glmer --> the random effects structure does not converge!
# with brms:
b0 <- brm(Match ~ 1 +
(1 | Language),
data = field,
family=bernoulli(link='logit'),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b0); mcmc_plot(b0, type="trace"); mcmc_plot(b0); # very decent
bayestestR::hdi(b0, ci=0.95);
b0 <- brms_fit_indices(b0);
# posterior predictive checks
plot_prefix <- "./plots/field_null"; b <- b0;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
b_Sex <- brm(Match ~ 1 + Sex +
(1 + Sex | Language),
data = field,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b_Sex); mcmc_plot(b_Sex, type="trace"); mcmc_plot(b_Sex); # very decent
brms::hypothesis(b_Sex, c("Sexm = 0")); # p=0.84
b_Sex <- brms_fit_indices(b_Sex);
brms_compare_models(b0, b_Sex, "[null model]", "[+ sex]"); # sex does not matter
# posterior predictive checks
plot_prefix <- "./plots/field_sex"; b <- b_Sex;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="Sex"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_Age <- brm(Match ~ 1 + Age +
(1 + Age | Language),
data = field,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.999, max_treedepth=13));
summary(b_Age); mcmc_plot(b_Age, type="trace"); mcmc_plot(b_Age); # very decent
brms::hypothesis(b_Age, c("Age = 0")); # p=0.96
b_Age <- brms_fit_indices(b_Age);
brms_compare_models(b0, b_Age, "[null model]", "[+ age]"); # age is worse than null
# posterior predictive checks
plot_prefix <- "./plots/field_age"; b <- b_Age;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="Age"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_lr <- brm(Match ~ 1 + r_l_distinction_L1_f +
(1 + r_l_distinction_L1_f | Language),
data = field,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_lr); mcmc_plot(b_lr, type="trace"); mcmc_plot(b_lr); # very decent
brms::hypothesis(b_lr, c("r_l_distinction_L1_fdistinct = 0")); # p=0.54
b_lr <- brms_fit_indices(b_lr);
brms_compare_models(b0, b_lr, "[null model]", "[+ l/r distinction]"); # l/r distinction is worse than null
# posterior predictive checks
plot_prefix <- "./plots/field_lrdistinction"; b <- b_lr;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="r_l_distinction_L1_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_trill <- brm(Match ~ 1 + trill_real_L1_f +
(1 + trill_real_L1_f | Language),
data = field,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_trill); mcmc_plot(b_trill, type="trace"); mcmc_plot(b_trill); # very decent
brms::hypothesis(b_trill, c("trill_real_L1_fyes = 0")); # p=0.51
b_trill <- brms_fit_indices(b_trill);
brms_compare_models(b0, b_trill, "[null model]", "[+ trill]"); # trill is marginally better than the null
# posterior predictive checks
plot_prefix <- "./plots/field_trill"; b <- b_trill;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="trill_real_L1_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
# L2:
b_lr2 <- brm(Match ~ 1 + r_l_distinction_L2_f +
(1 + r_l_distinction_L2_f | Language),
data = field,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_lr2); mcmc_plot(b_lr2, type="trace"); mcmc_plot(b_lr2); # very decent
brms::hypothesis(b_lr2, c("r_l_distinction_L2_fdistinct = 0")); # p=0.51
#b_lr2 <- brms_fit_indices(b_lr2);
#brms_compare_models(b0, b_lr2, "[null model]", "[+ l/r distinction in L2]"); # <-- needs refitting the null on the restricted data
# posterior predictive checks
plot_prefix <- "./plots/field_L2_lrdistinction"; b <- b_lr2;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="r_l_distinction_L2_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
b_trill2 <- brm(Match ~ 1 + trill_real_L2_f +
(1 + trill_real_L2_f | Language),
data = field,
family=bernoulli(link='logit'),
prior=c(brms::set_prior("student_t(5, 0, 2.5)", class="b"),
brms::set_prior("lkj(2)", class="cor")),
save_pars=save_pars(all=TRUE), # needed for Bayes factors
sample_prior=TRUE, # needed for hypotheses tests
seed=998, cores=brms_ncores, iter=10000, warmup=4000, thin=2, control=list(adapt_delta=0.9999, max_treedepth=13));
summary(b_trill2); mcmc_plot(b_trill2, type="trace"); mcmc_plot(b_trill2); # very decent
brms::hypothesis(b_trill2, c("trill_real_L2_fyes = 0")); # p=0.53
#b_trill2 <- brms_fit_indices(b_trill2);
#brms_compare_models(b0, b_trill2, "[null model]", "[+ trill in L2]"); # trill might be better than null
# posterior predictive checks
plot_prefix <- "./plots/field_L2_trill"; b <- b_trill2;
mcmc_plot(b, type="trace"); ggsave(paste0(plot_prefix,"_mcmctrace.jpg"), device="jpeg", width=7, height=6, units="in", quality=85);
mcmc_plot(b); ggsave(paste0(plot_prefix,"_mcmcestim.jpg"), device="jpeg", width=7, height=4, units="in", quality=85);
pp_check(b, ndraws=100) + xlab('p(match)') + ggtitle('Posterior predictive density overlay'); ggsave(paste0(plot_prefix,"_ppchecks_densoverlay.jpg"), device="jpeg", width=6, height=4, units="in", quality=85);
g <- arrangeGrob(pp_check(b, type='stat', stat='min') + xlab('p(match)') + ggtitle('Posterior predictive distribution of minimum values'),
pp_check(b, type='stat', stat='mean') + xlab('p(match)') + ggtitle('Posterior predictive distribution of means'),
pp_check(b, type='stat', stat='max') + xlab('p(match)') + ggtitle('Posterior predictive distribution of maximum values'),
ncol=1); # seems fine (the observed, y, does fall in the predicted distributions, y_{rep} for the max, mean and min)
as_ggplot(g); ggsave(paste0(plot_prefix,"_ppchecks_min_mean_max.jpg"), plot=g, device="jpeg", width=6, height=4*3, units="in", quality=85);
plot_model(b, type="pred", terms="trill_real_L2_f"); ggsave(paste0(plot_prefix,"_pred.jpg"), device="jpeg", width=4, height=4, units="in", quality=85);
## save all these results for later:
# save the models( if requested):
if( .save_models )
{
field_regressions_models <- list("brms"=list("null" =b0,
"order" =b_Order,
"sex" =b_Sex,
"age" =b_Age,
"rl" =b_lr,
"trill" =b_trill,
"rl2" =b_lr2,
"trill2"=b_trill2));
save(field_regressions_models, file="./models/field_regressions_models.rds", compress="xz", compression_level=9);
}
# save the summaries (much less disk space but enough info to understand the results):
field_regressions_summaries <- list("brms"=list("prior_pred_check"=list("plot_path"="./plots/field_prior_predictive_checks.jpg"),
"null" =list("plot_prefix"="./plots/field_null",
"hdi"=bayestestR::hdi(b0, ci=0.95),
"rope"=bayestestR::rope(b0, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b0), "ranef"=ranef(b0),
"summary"=capture.output(summary(b0))),
"sex" =list("plot_prefix"="./plots/field_sex",
"hdi"=bayestestR::hdi(b_Sex, ci=0.95),
"hypotheses"=brms::hypothesis(b_Sex, c("Sexm = 0")),
"rope"=bayestestR::rope(b_Sex, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_Sex), "ranef"=ranef(b_Sex),
"summary"=capture.output(summary(b_Sex)),
"cmp_0"=brms_compare_models(b0, b_Sex, "[null model]", "[+ sex]")),
"age" =list("plot_prefix"="./plots/field_age",
"hdi"=bayestestR::hdi(b_Age, ci=0.95),
"hypotheses"=brms::hypothesis(b_Age, c("Age = 0")),
"rope"=bayestestR::rope(b_Age, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_Age), "ranef"=ranef(b_Age),
"summary"=capture.output(summary(b_Age)),
"cmp_0"=brms_compare_models(b0, b_Age, "[null model]", "[+ age]")),
"rl" =list("plot_prefix"="./plots/field_lrdistinction",
"hdi"=bayestestR::hdi(b_lr, ci=0.95),
"hypotheses"=brms::hypothesis(b_lr, c("r_l_distinction_L1_fdistinct = 0")),
"rope"=bayestestR::rope(b_lr, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_lr), "ranef"=ranef(b_lr),
"summary"=capture.output(summary(b_lr)),
"cmp_0"=brms_compare_models(b0, b_lr, "[null model]", "[+ r/l distinction]")),
"trill" =list("plot_prefix"="./plots/field_trill",
"hdi"=bayestestR::hdi(b_trill, ci=0.95),
"hypotheses"=brms::hypothesis(b_trill, c("trill_real_L1_fyes = 0")),
"rope"=bayestestR::rope(b_trill, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_trill), "ranef"=ranef(b_trill),
"summary"=capture.output(summary(b_trill)),
"cmp_0"=brms_compare_models(b0, b_trill, "[null model]", "[+ trill]")),
"rl2" =list("plot_prefix"="./plots/field_L2_lrdistinction",
"hdi"=bayestestR::hdi(b_lr2, ci=0.95),
"hypotheses"=brms::hypothesis(b_lr2, c("r_l_distinction_L2_fdistinct = 0")),
"rope"=bayestestR::rope(b_lr2, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_lr2), "ranef"=ranef(b_lr2),
"summary"=capture.output(summary(b_lr2)),
"cmp_0"=NULL),
"trill2"=list("plot_prefix"="./plots/field_L2_trill",
"hdi"=bayestestR::hdi(b_trill2, ci=0.95),
"hypotheses"=brms::hypothesis(b_trill2, c("trill_real_L2_fyes = 0")),
"rope"=bayestestR::rope(b_trill2, ci=0.95, ci_method="HDI", verbose=FALSE),
"fixef"=fixef(b_trill2), "ranef"=ranef(b_trill2),
"summary"=capture.output(summary(b_trill2)),
"cmp_0"=NULL)));
save(field_regressions_summaries, file="./models/field_regressions_summaries.rds", compress="xz", compression_level=9);
} else
{
#if( .save_models && file.exists("./models/field_regressions_models.rds") ) load("./models/field_regressions_models.rds") else field_regressions_models <- NULL;
if( file.exists("./models/field_regressions_summaries.rds") ) load("./models/field_regressions_summaries.rds") else field_regressions_summaries <- NULL;
}
Please note that we could not fit frequentist models, as the random-effects structure does not converge.
First, we determined the probability of a perfect match (i.e., both responses are correct) without any predictors (the null model).
For clarity, the null model is:
Match ~ 1 +
(1 | Language)
and we are interested in the intercept, which, on the logit scale, encodes the probability of a match.
cat(paste0(field_regressions_summaries$brms$null$summary, collapse="\n"));
## Family: bernoulli
## Links: mu = logit
## Formula: Match ~ 1 + (1 | Language)
## Data: field (Number of observations: 127)
## Draws: 4 chains, each with iter = 10000; warmup = 4000; thin = 2;
## total post-warmup draws = 12000
##
## Group-Level Effects:
## ~Language (Number of levels: 6)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.99 0.98 0.03 3.58 1.00 6502 7736
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3.83 0.88 2.32 5.77 1.00 7124 6903
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The intercept is clearly and significantly > 0 (% inside ROPE = 0.0%) and translates into a probability of a match p(match)=97.9% 95%HDI [90.6%, 99.7%], much higher than 50%.
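The conversion behind these numbers is the inverse-logit applied to the posterior draws of the intercept; applying it directly to the point estimate and CI bounds of the summary above gives a close (though not identical, since the summary reports a quantile interval while we report the HDI) result:

```r
# Intercept and 95% CI bounds from the brms summary above, mapped from
# the logit scale to p(match) with the inverse-logit:
round(100 * plogis(3.83), 1)           # ~97.9
round(100 * plogis(c(2.32, 5.77)), 1)  # ~c(91.1, 99.7)
```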
The model converges well. Please note that the L2 predictors are degenerate in this sample, so the corresponding models do not make sense.
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | B | 4.06 [2.12, 6.09] | 98.3% [89.3%, 99.8%] | 0.0% | |
| sex (βM-F) | B | 0.42 [-2.62, 3.69] | 60.4% [6.8%, 97.6%] | pROPE=0.1%, p(β=0)=0.65 | BF: anecdotal evidence for [null model] against [+ sex] (BF=2.28), ΔLOO([null model] ≈ [+ sex])=0.9 (0.5), ΔWAIC([null model] ≈ [+ sex])=0.7 (0.5), ΔKFOLD([null model] ≈ [+ sex])=0.5 (0.5) |
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | B | 6.47 [2.15, 11.64] | 99.8% [89.5%, 100.0%] | 0.0% | |
| age (β) | B | -0.07 [-0.23, 0.09] | 48.3% [44.2%, 52.2%] | pROPE=1.0%, p(β=0)=0.96 | BF: extreme evidence for [null model] against [+ age] (BF=869), ΔLOO([null model] ≈ [+ age])=1.0 (1.9), ΔWAIC([null model] ≈ [+ age])=0.4 (1.9), ΔKFOLD([null model] > [+ age])=7.4 (4.2) |
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | B | 5.18 [0.82, 10.65] | 99.4% [69.5%, 100.0%] | 0.0% | |
| r/l distinction (βdistinct-not) | B | -1.26 [-6.08, 3.55] | 22.0% [0.2%, 97.2%] | pROPE=0.1%, p(β=0)=0.54 | BF: anecdotal evidence for [null model] against [+ r/l distinction] (BF=2.48), ΔLOO([null model] ≈ [+ r/l distinction])=0.6 (0.4), ΔWAIC([null model] ≈ [+ r/l distinction])=0.5 (0.4), ΔKFOLD([null model] ≈ [+ r/l distinction])=0.9 (0.5) |
| variable | method | log odds ratio (LOR) | probability (%) | p | model comparison |
|---|---|---|---|---|---|
| intercept (α) | B | 3.60 [1.86, 5.63] | 97.3% [86.5%, 99.6%] | 0.0% | |
| [r] (βyes-no) | B | 1.65 [-2.87, 6.47] | 83.9% [5.4%, 99.8%] | pROPE=0.1%, p(β=0)=0.51 | BF: anecdotal evidence for [null model] against [+ trill] (BF=1.13), ΔLOO([+ trill] ≈ [null model])=0.4 (0.2), ΔWAIC([+ trill] ≈ [null model])=0.4 (0.3), ΔKFOLD([+ trill] ≈ [null model])=0.4 (0.3) |
In all cases the intercept α is highly significant and positive, and comparable with the null model, translating into match probabilities ≥ 98%, so much higher than the 50% expected by chance.
On the other hand, none of the potential predictors seems to make any difference.
Overall, there is a very strong association between [r] and the rugged line and between [l] and the continuous line across the board, and no predictor seems to really affect it, except for the order of presentation (“l first” decreases the association).
knitr::knit_exit();